Subspace modeling and selection for noisy speech recognition

Authors

Jen-Tzung Chien

Chuan-Wei Ting

Abstract

This paper presents a new subspace modeling and selection approach for noisy speech recognition. For subspace modeling, we develop factor analysis (FA) to represent noisy speech. FA is a data generation model in which common factors are extracted together with a factor loading matrix and specific factors. We connect FA to the signal subspace (SS) approach. Notably, FA partitions the noisy speech space into a principal subspace containing speech and noise and a minor subspace containing residual speech and residual noise. To estimate clean speech, we minimize the energy of speech distortion in both the principal and the minor subspace. For subspace selection, we seek the optimal subspace partition by solving hypothesis testing problems. We test the equality of the eigenvalues in the minor subspace to determine the subspace dimension. In keeping with the FA model, we further test the hypothesis that the residual speech is uncorrelated. Optimal solutions are obtained through likelihood ratio tests whose test statistics are approximated by chi-square distributions. The subspace partition is chosen according to the confidence in rejecting the null hypotheses. In experiments on the Aurora2 database, FA outperforms SS in subspace modeling, and the new selection algorithms effectively determine the subspace dimension for noisy speech recognition.
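The eigenvalue-equality test mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the authors' exact statistic, so the code below assumes the standard textbook likelihood ratio test for equality of the smallest covariance eigenvalues, with a chi-square approximation. The function names (`equality_statistic`, `chi2_critical`, `select_dimension`), the sequential search order, and the Wilson-Hilferty approximation to the chi-square quantile are all illustrative assumptions, not taken from the paper.

```python
import math
from statistics import NormalDist


def equality_statistic(eigvals, k, n):
    """Textbook LRT statistic and degrees of freedom for the null hypothesis
    that all eigenvalues after the first k (the candidate minor subspace)
    are equal. n is the number of observations (assumed large)."""
    minor = eigvals[k:]
    q = len(minor)
    if q < 2:
        raise ValueError("need at least two minor eigenvalues to test equality")
    arithmetic_mean = sum(minor) / q
    mean_log = sum(math.log(v) for v in minor) / q  # log of the geometric mean
    stat = n * q * (math.log(arithmetic_mean) - mean_log)
    df = (q + 2) * (q - 1) // 2
    return stat, df


def chi2_critical(df, alpha):
    """Approximate upper-alpha chi-square quantile (Wilson-Hilferty).
    An assumption made to keep the sketch standard-library only."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def select_dimension(eigvals, n, alpha=0.05):
    """Return the smallest principal-subspace dimension k for which equality
    of the remaining eigenvalues is not rejected at level alpha."""
    eigvals = sorted(eigvals, reverse=True)
    p = len(eigvals)
    for k in range(p - 1):
        stat, df = equality_statistic(eigvals, k, n)
        if stat <= chi2_critical(df, alpha):
            return k
    return p  # no minor subspace with equal eigenvalues was accepted
```

For example, `select_dimension([10.0, 5.0, 1.0, 1.0, 1.0], n=100)` rejects equality for k = 0 and k = 1 and stops at k = 2, where the three remaining eigenvalues are identical and the statistic is zero. The second test described in the abstract (uncorrelated residual speech) is not covered by this sketch.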

Similar resources

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown strong performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of emotional states from speech in noisy conditions has in recent years become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

Speech Enhancement Through an Optimized Subspace Division Technique

Speech enhancement techniques are often employed to improve the quality and intelligibility of noisy speech signals. This paper discusses a novel technique for speech enhancement based on Singular Value Decomposition. The implementation uses a Genetic Algorithm based optimization method to reduce the effects of environmental noise on the singular vectors as well as t...

Single channel speech enhancement using principal component analysis and MDL subspace selection

We present in this paper a novel subspace approach for single channel speech enhancement and speech recognition in highly noisy environments. Our algorithm is based on principal component analysis and the optimal subspace selection is provided by a minimum description length criterion. This choice overcomes the limitations encountered with other selection criteria, like the overestimation of th...

Publication date: 2006